fix(tesseract): multi_stage reduce_by no-ops on type:time dimensions#10855
Open
tlangton3 wants to merge 1 commit into
Open
fix(tesseract): multi_stage reduce_by no-ops on type:time dimensions#10855tlangton3 wants to merge 1 commit into
tlangton3 wants to merge 1 commit into
Conversation
A bare reduce_by/group_by (or grain exclude/keep_only) entry referencing a time dimension never matched the projected TimeDimensionSymbol, whose full_name carries a granularity suffix (_day, _month, ...). The entry was silently ignored: the time dimension stayed in (or vanished from) the window PARTITION BY / the JOIN-model leaf grain, collapsing window aggregates to one-row partitions. The granularity-aware matching lives on MultiStageGrain itself (partition_dimensions): entries with an explicit granularity match strictly on full_name; bare entries match across granularities by unwrapping both sides to their base symbol. Both consumers — the window path's member_partition_by_logical and the JOIN-model's partition_filter — now delegate to it, removing the duplicated body and discharging the FIXME about the two copies drifting. Fixes cube-js#10854.
fe7d04e to
640bc61
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #10854. A bare
reduce_by: [<time_dimension>](orgroup_by, or thegrain:directive'sexclude/keep_only) silently does nothing when the query projects that time dimension with a granularity.Problem
Projected time dimensions are
TimeDimensionSymbols whosefull_namecarries a granularity suffix (created_at_day,created_at_month, ...), and the suffix survives the whole reference chain. A bare grain entry compiles to a plainDimensionSymbol(orders.created_at), so thefull_name-based matching in the planner never matches it — the entry is silently ignored, the time dimension stays in (or vanishes from) the partition, and window aggregates collapse to one-row partitions. Sums per day instead of running totals, ranks of 1 everywhere — no error, no warning.This bites at two sites: the window path's
PARTITION BYcomputation, and the JOIN-model path's leaf-grain shaping (partition_filter), which carried a FIXME warning the two copies would drift if only one was updated.Fix
The granularity-aware matching now lives on
MultiStageGrainitself (partition_dimensions()), and both sites delegate to it — removing the duplicated body and discharging that FIXME. The matching rule:created_at.month) matches strictly onfull_name—group_by: [created_at.month]keeps only the month projection, not sibling granularities;The asymmetry is deliberate: an earlier draft that unwrapped unconditionally broke queries projecting multiple granularities of the same time dimension, which the multi-granularity test below now pins.
Because the legacy
reduce_by/group_byand the newgrain:directive compile through the same grain construction, the fix covers both spellings.Testing
Four integration tests (executed against postgres via testcontainers, result-level snapshots):
reduce_by_time_dim— bare reduce_by on the window path; fails without the fix (per-month splits instead of per-status totals)reduce_by_time_dim_join_model— same bug via atype: maxmeasure, which is not window-eligible and exercises thepartition_filtersite; fails without the fixgrain_exclude_time_dim_join_model— bare time-dim exclude spelled via thegrain:directive with a non-emptyinclude(forces the JOIN-model path); fails without the fixgroup_by_time_dim_multiple_granularities— guard test pinning the strict-match behaviour for explicit granularities; passes with and without the fix by designTwo adjacent issues are deliberately out of scope and can be follow-ups: granularity-qualified entries in filter directives (
filter: exclude: [created_at.month]) match nothing for date-range filters, andgrain.includeof a bare time dimension can add a duplicate grouping next to an already-projected granularity.